The Scarr-Rowe effect

We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely on interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes is larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then a trait will only depend on genes.

This interaction can be visualized as follows:

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability,
        ylim = c(0,1),
        xlab = "environment",
        ylab = "heritability (effect of genes)")

Here is a DAG that describes such a model, where

and the Scarr-Rowe effect means the the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).

library(dagitty) 
dag = dagitty(
  "dag{
  IQ;
  A -> IQ;
  SES -> IQ;
  }")
coord.list = 
  list(
    x=c(A=0,SES=0,IQ=1),
    y=c(A=-.5,SES=.5,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)

Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of two other variables \(\small IQ = f(A,SES)\), but it does not tell us what the function \(f()\) is.

This function could be

or any other imaginable function that uses the variables. The model we have in mind now is:

\[ IQ = f(SES) A \]

Lets simulate data that show the Scarr-Rowe effect, first our exogenous variables. To keep things simple, we assume that all variables are close to normally distributed:

set.seed(25)
N = 250
SES = rlnorm(N,log(4.75),.3)
A = rlnorm(N,log(10),.2)

Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on the SES. So we formulate the weight of genes as as a function of SES.

library(boot)
# use inv.logit to give weights a lower and upper bound
# add a constant one to model that even at lowest SES 
# there are genetic effects
b_A = function(x) 0.5 + inv.logit(x-5) * 2.5

We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights, with a histogram of SES values on the bottom.

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),0,10, 
      ylab = expression("effect size"~beta[A]), 
      ylim = c(0,b_A(10)), xlab = "SES")
hist(SES,add = T, probability = T)

In this plot, the interaction is encoded by the fact that \(\small \beta_A\) has a slope.

Now we can simulate IQ values using exogenous variables and derived weights. We assume that IQ depends on \(\small A\), whereby this effect depends on \(\small SES\): \(\small IQ = f(SES)A\). Lets simulate data from this model:

# Adding 70 to get an IQ around 100
IQ = 85 + b_A(SES)*A + rnorm(N,sd = 10)
par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
hist(IQ)

If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:

par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
plot(A,IQ, pch = 16, cex = .5,)

Categorial interactions

To run a simple interaction with a categorical interaction variable we dichotomise the SES variable to create the variable SES.c:

low.SES = which(SES < quantile(SES,.50))
high.SES = which(SES > quantile(SES,.50))
idx = c(low.SES,high.SES)
dt = data.frame(
  A = A[idx],
  IQ = IQ[idx],
  SES = SES[idx],
  SES.c = c(rep(1,length(low.SES)),
            rep(2,length(high.SES))))

Lets plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:

low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function(dt) {
  par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
  plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
       ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
  legend("topleft", pch = 16, col = c("black","red"),
         legend = c("low SES","high SES"), bty = "n")
  
}
plot_data(dt)

#abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
#abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")

Next, we estimate some linear regression models with increasing complexity, our goal being to have a model that accurately depicts the effect of A, C, E, and SES on the IQ.

Simple linear regression

We start with a simple linear regression. But first some intuitions on priors:

  • Eyeballing the data, we see that the IQ at an average \(\small A\) of 10 is around 100, so we use a prior a ~ dnorm(100,5)
  • The range of \(\small A\) is 16-6 = 10 and the range of IQ = 120-80 = 40. So for a one unit increase of \(\small A\), IQ changes around 40/10 = 4. If we want that an effect of +/-4 is at 1 sd of the prior, we set the prior for the effect of \(\small A\) to a ~ dnorm(0,4). This prior allows for the possibility of a negative effect of \(\small A\).
  • Lacking a strong intuition for the error variance, we set the prior for the variance to a generous dexp(0.25)

Here is the regression model:

set.seed(1)
IQ.A = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + b*(A - 10),
      a ~ dnorm(100,5),
      b ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)

Lets quickly look at the prior predictions to make sure the piors are OK.

prior = extract.prior(IQ.A)
A.seq = seq(from=0,to=20,length.out=50)
mu = link(
  IQ.A,
  post=prior,
  data=data.frame(A=A.seq))

par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(
  0,type = "n", ylab = "IQ", xlab = "A", 
  xlim = c(min(A),max(A)),
  ylim = c(70,130))
matlines(
  A.seq,t(mu[1:100,]),type = "l", lty = 1,
  col = adjustcolor("blue", alpha = .5))

Yes, this looks good.

Here are the posterior predictions:

plot_data(dt)
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)

This looks generally fine, but we are not capturing the group effects.

Linear regression with category-main effect

To fit a model with a main effect of education, we use the indexing approach:

set.seed(1)
IQ.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot_data(dt)
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")

We can see already from the plot the the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:

compare(
  IQ.A,IQ.A_SES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS   dSE pPSIS weight
## IQ.A_SES 1874.50 22.63  0.00    NA  4.43      1
## IQ.A     1947.72 22.63 73.22 14.93  3.42      0

Linear regression with category-main effect and category-slope

Lastly, we can also let the slopes vary by SES:

IQ.AxSES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b[SES.c]*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b[SES.c] ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)

The key part of this model is that we are not specifying a main effect for A and an interaction effect with SES, but that we are estimating 2 regression coefficients, one for high and one for low SES. This simplifies putting reasonable priors on the effect in each group, but also implies that we do not put a prior on the difference between the two groups.

And again the posterior predictions:

par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot_data(dt)
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")

Even though the DGP does not have different “intercepts” for high and log SES, we have to add it, because our model puts the intercept at A = 10, whereas it was A = 0 in the DGP.

Here is a comparison of all the models we have fit:

compare(
  IQ.A, IQ.A_SES, IQ.AxSES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS   dSE pPSIS weight
## IQ.AxSES 1874.33 22.78  0.00    NA  5.49   0.56
## IQ.A_SES 1874.82 22.62  0.49  3.52  4.58   0.44
## IQ.A     1947.86 22.72 73.52 16.16  3.48   0.00
compare(
  IQ.A, IQ.A_SES, IQ.AxSES,
  func = "WAIC") %>% 
  round(2)
##             WAIC    SE dWAIC   dSE pWAIC weight
## IQ.A_SES 1874.76 22.59  0.00    NA  4.59   0.54
## IQ.AxSES 1875.10 22.76  0.34  3.57  5.89   0.46
## IQ.A     1947.55 22.66 72.80 14.96  3.30   0.00

While the correct model has the best PSIS and WAIC value, for the top two models the differences are not large compared to the SEs of the differences.

A simple contrast

How can we figure out if the difference in slopes is reliably larger than 0? We simply extract posteriors and calculate the difference in slopes from them:

# extract posterior
post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a"     "b"
head(post$b)
##           [,1]       [,2]
## [1,] 0.8962989  1.5820929
## [2,] 0.2146403  1.2038705
## [3,] 0.6943730  1.3693562
## [4,] 0.3912184  1.1973200
## [5,] 1.3745367  1.8418082
## [6,] 1.2216063 -0.3121984

We simply calculate the differences of the two b parameters:

delta.b = post$b[,2]-post$b[,1]

And now we can show e.g. mean and PIs:

c(mean = mean(delta.b),
  PI(delta.b,prob = c(.9))) %>% 
  round(2)
##  mean    5%   95% 
##  0.91 -0.10  1.95

And we can plot a histogram of the contrast:

par(mar=c(3,3,2,1), mgp=c(1.25,.5,0), tck=-.01)
hist(delta.b, xlim = range(c(0,delta.b)))
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")

How about the DGP?

To see if our results are reasonable, lets compare our estimated parameters with the parameters governing the DGP. First the model parameters:

precis(IQ.AxSES, depth = 2) %>% round(2)
##         mean   sd   5.5%  94.5%
## a[1]   95.74 0.88  94.34  97.14
## a[2]  107.19 0.89 105.77 108.61
## b[1]    0.60 0.42  -0.08   1.27
## b[2]    1.50 0.46   0.78   2.23
## sigma   9.96 0.44   9.26  10.66

Now the parameters from the DGP:

rbind(
  mean(b_A(SES[SES<quantile(SES,.30)])), 
  mean(b_A(SES[SES>quantile(SES,.70)])))
##           [,1]
## [1,] 0.9331048
## [2,] 2.5280807

We are not recovering the exact parameters, after all we used a golem instead of a model that depicts the DGP, but qualitatively the results are consistent with the DGP.

Symetric interactions

Earlier, we described the function

\[ IQ = f(SES) A \]

By saying that the effect \(\small A\) is a function of \(\small SES\). However, we really just have two variables: \(\small A\) and \(\small f(SES)\) which are multiplied with each other. Therefore, these statements are equivalent:

  • The effect of \(\small A\) depends on \(\small f(SES)\)
  • The effect of \(\small f(SES)\) depends on \(\small A\)

Accordingly, we can also plot the difference between high and low \(\small SES\) as a function of \(\small A\):

A.seq = seq(from=5,to=16,length.out=50)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=1,A=A.seq))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=2,A=A.seq))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(A.seq, colMeans(delta),'l',
     ylim = range(CIs),
     col = "blue",
     xlab = "A",
     ylab = expression(IQ[high~SES]~-~IQ[low~SES]))
shade(CIs,A.seq, col = adjustcolor("blue", alpha = .25))
abline(h = 0, lty = 2)

Continuous interactions

We are continuing with our simulated Scarr-Rowe data set, but this time we use the full data and formulate a continuous interaction.

dt = data.frame(
  A = A,
  IQ = IQ,
  SES = SES)

Plotting the data

You can try it the fancy way and make a 3d plot, but in this instance its a lot of effort for meager results:

library(plot3D)
points3D(A,SES,IQ, type = "h",
         col = "black",
         cex = .75,
         lty = 3,
         pch = 16,
         phi = 20,
         theta = 45,
         xlab = "A",
         ylab = "SES",
         zlab = "IQ")

A panel of 2d plots does the job better, and will later allow to display uncertainty.

qs = quantile(SES, probs = seq(0,1,.25))
par(mfrow = c(1,4), mar=c(2.5,2.5,2,.5), mgp=c(1.5,.5,0), tck=-.01)
for (k in 2:length(qs)) {
  idx = which(dt$SES > qs[k-1] & dt$SES < qs[k])
  tmp.dt = dt[idx,]
  with(tmp.dt,
       plot(IQ~A, pch = 16, main = paste0(k-1,". quartile SES"),
       ylim = range(dt$IQ), xlim = range(dt$A)))
}

How many panels one uses depends on how one assumes the moderator influences the effect of interest.

Simple linear regression without interaction

This first model assumes that there are just main effects of \(\small A\) and \(\small SES\). Indeed, this is not an unreasonable assumption if one remembers the raw data, which we show here again:

par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
plot(A,IQ, pch = 16, cex = .5,)

In preparation for the interaction model we are not centering \(\small SES\), but we are re-scaling it to have a minimum just above 0. If we would center to zero, below zero SES values would predict a negative additive genetic effect.

The simple linear model can be descried as follows (omitting indicators \(_i\) for individuals):

\[ \mu = \alpha + \beta_{A}A + \beta_{S}SES \]

IQc.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5),
      a ~ dnorm(100,10),
      ba ~ dnorm(0,4),
      bs ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)

We are using prior predictions to check if the model does a good job:

plot.pred(IQc.A_SES,dt, type = "prior")

This is pretty wild. Lets try a bit narrower priors:

set.seed(1)
IQc.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5),
      a ~ dnorm(100,7.5),
      ba ~ dnorm(0,2),
      bs ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.A_SES,dt, type = "prior")

These priors are reasonable, so we look at the predictions:

plot.pred(IQc.A_SES,dt)

As expected from the model we specified, all slopes are the same. Note that the different “intercepts” we seem to come from this part of the model: mu <- ... + bs*(SES - 2.5),.

Linear regression with main effects and interaction

If we want to model an interaction effect, we want to model that the effects of \(\small A\) and \(\small SES\) depend on each other. Broadly, we want to implement

\[ \gamma_A = f(SES) \]

if we assume the \(\small f(SES)\) is a linear function, we are saying that

\[ \gamma_A = \beta_A + \beta_{AS}SES \]

Here, \(\small \beta_A\) is the intercept for the effect of \(\small A\), i.e. the effect of \(\small A\) when \(\small SES = 0\). If knew that the effect of \(\small A\) has to be zero when \(\small SES = 0\), we could ommit the \(\small \beta_A\)

Now lets think back to our original regression, with slightly modified notations

\[ \mu = \alpha + \gamma_{A}A + \beta_{S}SES \]

We can just plug in \(\small (\beta_A + \beta_{AS}SES)\) in place of \(\small \gamma_a\):

\[ \mu = \alpha + (\beta_A + \beta_{AS}SES)A + \beta_{S}SES \]

and if we multiply out the brackets, we get

\[ \mu = \alpha + \beta_A A + \beta_{S}SES + \beta_{AS}SES \, A \]

which is the standard interaction model with 2 main effects.

The quap model is then similar to the previous, but adds this interaction term: bas*(A - 10)*(SES - 2.5).

IQc.AxSES.m = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5) + bas*(A - 10)*(SES - 2.5),
      a ~ dnorm(100,7.5),
      ba ~ dnorm(0,2),
      bs ~ dnorm(0,2),
      bas ~ dnorm(0,1),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.AxSES.m,dt, type = "prior")

This priors look OK (even though slopes become extreme at high SES values), so we plot the model predictions:

plot.pred(IQc.AxSES.m,dt)

Model comparison

compare(IQc.A_SES,IQc.AxSES.m, func = "WAIC") %>% round(1)
##               WAIC   SE dWAIC dSE pWAIC weight
## IQc.AxSES.m 1876.9 23.3   0.0  NA   5.6      1
## IQc.A_SES   1882.8 23.3   5.9 6.4   4.8      0
compare(IQc.A_SES,IQc.AxSES.m, func = "PSIS") %>% round(1)
##               PSIS   SE dPSIS dSE pPSIS weight
## IQc.AxSES.m 1876.8 23.5   0.0  NA   5.6    0.9
## IQc.A_SES   1882.6 23.4   5.8 6.5   4.7    0.1

PSIS and WAIC correctly identify the interaction model, and the difference is clearer compared to the categorical interaction, which used only a subset of the data and through way information by dichotomizing.

A fancy plot

library(akima)
library(plotly)
mu = link(IQc.AxSES.m)
s = interp(dt$A,dt$SES,colMeans(mu))
names(s) = c("A","SES","IQ")
fig = with(
  s,
  plot_ly(x = ~A, y = ~SES, z = ~IQ,
          width = 900, height = 900) %>% 
    add_surface(
      contours = list(
        z = list(
          show=TRUE,
          usecolormap=TRUE,
          highlightcolor="#ff0000",
          project=list(z=TRUE)
        )
      )
    ) %>% 
    add_markers(x = dt$A, y = dt$SES, z = dt$IQ, size = .5)
)
fig = 
  fig %>% layout(
    scene = list(
      camera=list(
        eye = list(x=1.87, y=0.88, z=0.64)
      )
    )
  )

fig

Scarr-Rowe effect: Latest results

Giagrande 2019: Scarr-Rowe effect

Excersices